This study focuses on improving the optical character recognition (OCR) data for panels in the COMICS dataset, the largest dataset containing text and images from comic books. To this end, we developed a pipeline for OCR processing and labeling of comic books and created the first text detection and recognition datasets for western comics, called "COMICS Text+: Detection" and "COMICS Text+: Recognition". We evaluated the performance of state-of-the-art text detection and recognition models on these datasets and found significant improvement in word accuracy and normalized edit distance compared to the text in COMICS. We also created a new dataset called "COMICS Text+", which contains the text extracted from the textboxes in the COMICS dataset. Using the improved text data of COMICS Text+ in the comics processing model resulted in state-of-the-art performance on cloze-style tasks without changing the model architecture. The COMICS Text+ dataset can be a valuable resource for researchers working on tasks including text detection, text recognition, and high-level processing of comics, such as narrative understanding, character relations, and story generation. All the data and inference instructions can be accessed at https://github.com/gsoykan/comics_text_plus.
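The two OCR evaluation metrics named above, word accuracy and normalized edit distance, can be sketched as follows. This is an illustrative implementation with names of our own choosing, not code from the released repository:

```python
# Illustrative sketch of the two OCR metrics mentioned in the abstract:
# word accuracy and normalized edit distance.
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def normalized_edit_distance(pred: str, gold: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    if not pred and not gold:
        return 0.0
    return edit_distance(pred, gold) / max(len(pred), len(gold))

def word_accuracy(preds: list[str], golds: list[str]) -> float:
    """Fraction of predicted words that exactly match the ground truth."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)
```

Lower normalized edit distance and higher word accuracy both indicate better OCR quality.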
Efficient data transfers over high-speed, long-distance shared networks require proper utilization of available network bandwidth. Using parallel TCP streams enables an application to exploit network parallelism and can improve transfer throughput; however, finding the optimum number of parallel TCP streams is challenging due to nondeterministic background traffic sharing the same network. Additionally, the non-stationary, multi-objective, and partially observable nature of network signals in the host systems adds extra complexity to determining the current network condition. In this work, we present a novel approach to finding the optimum number of parallel TCP streams using deep reinforcement learning (RL). We devise a learning-based algorithm capable of generalizing across different network conditions and utilizing the available network bandwidth intelligently. Contrary to rule-based heuristics that do not generalize well in unknown network scenarios, our RL-based solution can dynamically discover and adapt the number of parallel TCP streams to maximize network bandwidth utilization without congesting the network, while ensuring fairness among contending transfers. We extensively evaluated our RL-based algorithm's performance, comparing it with several state-of-the-art online optimization algorithms. The results show that our RL-based algorithm can find near-optimal solutions 40% faster while achieving up to 15% higher throughput. We also show that, unlike a greedy algorithm, our devised RL-based algorithm can avoid network congestion and fairly share the available network resources among contending transfers.
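The core idea, an agent that learns how many parallel streams to open from observed throughput, can be illustrated with a toy tabular Q-learning loop. The paper's actual method uses deep RL with richer, partially observable network signals; the reward shape, the simulated environment, and all constants below are invented for illustration only:

```python
# Hypothetical sketch: a tabular Q-learning agent adjusting the number of
# parallel TCP streams. The environment is a toy throughput model, not a
# real network; the paper's method uses deep RL on real host signals.
import random

ACTIONS = (-1, 0, +1)          # decrease / keep / increase stream count
MAX_STREAMS = 16

def simulated_throughput(streams: int, capacity: float = 10.0) -> float:
    """Toy model: diminishing returns, then a congestion penalty past 8 streams."""
    raw = capacity * streams / (streams + 2)
    penalty = max(0.0, streams - 8) * 0.5
    return raw - penalty

def train(episodes: int = 2000, alpha: float = 0.1, eps: float = 0.2) -> dict:
    """Learn a Q-table over (stream count, action) pairs."""
    q = {(s, a): 0.0 for s in range(1, MAX_STREAMS + 1) for a in ACTIONS}
    streams = 1
    for _ in range(episodes):
        # Epsilon-greedy action selection.
        a = (random.choice(ACTIONS) if random.random() < eps
             else max(ACTIONS, key=lambda x: q[(streams, x)]))
        nxt = min(MAX_STREAMS, max(1, streams + a))
        reward = simulated_throughput(nxt)
        best_next = max(q[(nxt, x)] for x in ACTIONS)
        q[(streams, a)] += alpha * (reward + 0.9 * best_next - q[(streams, a)])
        streams = nxt
    return q
```

In this toy model the greedy policy settles near the throughput-maximizing stream count (8), mirroring the idea of adapting parallelism without congesting the network.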
Recent works have shown that convolutional neural network (CNN) architectures have a spectral bias towards lower frequencies, which has been leveraged for various image restoration tasks in the Deep Image Prior (DIP) framework. The benefit of the inductive bias the network imposes in the DIP framework depends on the architecture. Therefore, researchers have studied how to automate the search to determine the best-performing model. However, common neural architecture search (NAS) techniques are resource- and time-intensive. Moreover, the best-performing model is typically determined for an entire dataset of images rather than independently for each image, which would be prohibitively expensive. In this work, we first show that the optimal neural architecture in the DIP framework is image-dependent. Leveraging this insight, we then propose an image-specific NAS strategy for the DIP framework that requires substantially less training than typical NAS approaches, effectively enabling image-specific NAS. For a given image, noise is fed to a large set of untrained CNNs, and the power spectral density (PSD) of their outputs is compared with that of the corrupted image using various metrics. Based on this, a small cohort of image-specific architectures is selected and trained to reconstruct the corrupted image. Within this cohort, the model whose reconstruction is closest to the average of the reconstructed images is chosen as the final model. To justify the proposed strategy, we (1) demonstrate its performance on a NAS dataset containing more than 500 models from a particular search space, and (2) conduct extensive experiments on image denoising, inpainting, and super-resolution tasks. Our experiments show that image-specific metrics can reduce the search space to a small cohort of models, of which the best model outperforms current NAS approaches for image restoration.
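The PSD-based screening step described above can be sketched with NumPy: compute a radially averaged power spectral density for an untrained network's output and for the corrupted image, then compare them. The function names, the 1-D radial averaging, and the log-L2 distance are illustrative assumptions, not the paper's exact metrics:

```python
# Hedged sketch of PSD-based architecture screening: compare the power
# spectral density of a candidate network's output with that of the
# corrupted image. Binning scheme and distance metric are illustrative.
import numpy as np

def radial_psd(img: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Radially averaged power spectral density of a 2-D image."""
    f = np.fft.fftshift(np.fft.fft2(img))
    psd2d = np.abs(f) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2)          # distance from DC component
    bins = np.linspace(0, r.max() + 1e-9, n_bins + 1)
    idx = np.digitize(r.ravel(), bins) - 1
    out = np.zeros(n_bins)
    for b in range(n_bins):
        mask = idx == b
        out[b] = psd2d.ravel()[mask].mean() if mask.any() else 0.0
    return out

def psd_distance(output: np.ndarray, corrupted: np.ndarray) -> float:
    """One possible screening metric: L2 distance between log-scaled PSDs."""
    return float(np.linalg.norm(np.log1p(radial_psd(output))
                                - np.log1p(radial_psd(corrupted))))
```

Candidates whose output PSD is closest to the corrupted image's would form the small cohort that is then actually trained.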
Treatment decisions for brain metastatic disease rely on knowledge of the primary organ site, currently determined with biopsy and histology. Here, we develop a novel deep learning approach for accurate non-invasive digital histology with whole-brain MRI data. Our IRB-approved single-site retrospective study comprised patients (n = 1,399) referred for MRI treatment planning and gamma knife radiosurgery over 19 years. Contrast-enhanced T1-weighted and T2-weighted fluid-attenuated inversion recovery brain MRI exams (n = 1,582) were preprocessed and input to the proposed deep learning workflow for tumor segmentation, modality transfer, and primary site classification into one of five classes (lung, breast, melanoma, renal, and other). Ten-fold cross-validation yielded an overall AUC of 0.947 (95% CI: 0.938, 0.955), lung class AUC of 0.899 (95% CI: 0.884, 0.915), breast class AUC of 0.990 (95% CI: 0.983, 0.997), melanoma class AUC of 0.882 (95% CI: 0.858, 0.906), renal class AUC of 0.870 (95% CI: 0.823, 0.918), and other class AUC of 0.885 (95% CI: 0.843, 0.949). These data establish that whole-brain imaging features are discriminative enough to allow accurate diagnosis of the primary organ site of malignancy. Our end-to-end deep radiomics approach demonstrates great promise for classifying metastatic tumor types from whole-brain MRI images. Further refinement may offer an invaluable clinical tool to expedite primary cancer site identification for precision treatment and improved outcomes.
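The per-class results above are one-vs-rest AUCs. For reference, a minimal AUC computation via the Mann-Whitney rank formulation looks like this; the study's actual pipeline (segmentation, modality transfer, CNN classification, cross-validated confidence intervals) is of course far richer than this metric sketch:

```python
# Illustrative only: one-vs-rest AUC via the Mann-Whitney U statistic,
# i.e. the probability a random positive scores above a random negative
# (ties count as half).
def auc(scores, labels) -> float:
    """AUC for binary labels (1 = class of interest, 0 = rest)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Computing this once per class against all other classes gives the per-class AUCs reported in the abstract.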
We address the detection of material defects inside layered material structures using compressive sensing-based multiple-input multiple-output (MIMO) wireless radar. Here, strong clutter due to the reflection off the layered structure's surface often makes the detection of defects challenging. Thus, sophisticated signal separation methods are required for improved defect detection. In many scenarios, the number of defects of interest is limited, and the signaling response of the layered structure can be modeled as low-rank. Thus, we propose joint rank and sparsity minimization for defect detection. In particular, we propose a non-convex approach based on iteratively reweighted nuclear and $\ell_1$-norm minimization (a double-reweighted approach) to obtain higher accuracy compared to conventional nuclear-norm and $\ell_1$-norm minimization. To this end, an iterative algorithm is designed to estimate the low-rank and sparse contributions. Further, we propose deep learning to learn the parameters of the algorithm (i.e., algorithm unfolding) in order to improve the accuracy and the convergence speed of the algorithm. Our numerical results show that the proposed approach outperforms conventional methods in terms of the mean squared errors of the recovered low-rank and sparse components and in terms of convergence speed.
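The double-reweighting idea can be sketched as an alternating decomposition $Y \approx L + S$: a singular-value threshold with weights $1/(\sigma + \epsilon)$ for the low-rank part, and a soft threshold with weights $1/(|S| + \epsilon)$ for the sparse part. The step sizes, weights, and iteration count below are illustrative choices, not the paper's tuned (or unfolded) algorithm:

```python
# Hedged sketch of iteratively reweighted nuclear + l1-norm minimization:
# decompose Y into a low-rank clutter component L and a sparse defect
# component S. All constants are illustrative.
import numpy as np

def reweighted_lr_sparse(Y, lam=0.05, mu=10.0, eps=0.1, iters=50):
    """Alternating double-reweighted low-rank + sparse decomposition."""
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    for _ in range(iters):
        # Low-rank step: singular-value thresholding with weights 1/(sigma+eps),
        # so small singular values are shrunk more than large ones.
        U, sig, Vt = np.linalg.svd(Y - S, full_matrices=False)
        sig = np.maximum(sig - mu / (sig + eps), 0.0)
        L = (U * sig) @ Vt
        # Sparse step: soft-thresholding with weights 1/(|S|+eps), so entries
        # that were large in the previous iterate are penalized less.
        R = Y - L
        thr = lam / (np.abs(S) + eps)
        S = np.sign(R) * np.maximum(np.abs(R) - thr, 0.0)
    return L, S
```

Algorithm unfolding would treat `lam`, `mu`, and `eps` (possibly per iteration) as learnable parameters of a fixed-depth network trained end-to-end.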